
<HTML>

<HEAD>
<TITLE>Advanced HTTPClient Info</TITLE>
<LINK REV="made"     HREF="mailto:ronald@innovation.ch">
<LINK REL="Previous" HREF="getting_started.html">
</HEAD>

<BODY BGCOLOR="#FFFFFF">

<H1>Advanced HTTPClient Info</H1>


<H2>Contents</H2>

<UL>
<LI><A HREF="#proxies">Proxies</A>
<LI><A HREF="#timeouts">Timeouts</A>
<LI><A HREF="#out_streams">Output Streams</A>
<LI><A HREF="#contexts">Contexts</A>
<LI><A HREF="#pers_con">Persistent Connections (Keep-Alive's)</A>
<LI><A HREF="#close_con">Closing Connections</A>
<LI><A HREF="#pipelining">Pipelining</A>
<LI><A HREF="#conn_details">Relationship between HTTPConnection instances
    and TCP connections</A>
<LI><A HREF="#threading">Multithreading</A>
<LI><A HREF="#urlcon">HttpURLConnection</A>
<LI><A HREF="#prot_vers">Protocol Version</A>
<LI><A HREF="#modules">Modules</A>
<LI><A HREF="#properties">Java System Properties recognized by HTTPClient</A>
<LI><A HREF="#headers">HTTP Header Fields</A>
<LI><A HREF="#specs">HTTP (and related) Specifications</A>
</UL>


<H2><A NAME="proxies">Proxies</A></H2>

<P>Support for proxies (including SOCKS) is fully implemented. However,
using proxies in applets is subject to a number of security restrictions
(see <A HREF="security.html">security</A> for more information on the
various security policies and the consequences that arise from them). If
you are using an http proxy then use the
<A HREF="api/HTTPClient/HTTPConnection.html#setProxyServer(java.lang.String, int)">HTTPConnection.setProxyServer()</A>
method to set the default proxy for all new HTTPConnections;
<A HREF="api/HTTPClient/HTTPConnection.html#setCurrentProxy(java.lang.String, int)">HTTPConnection.setCurrentProxy()</A>
can be used to set a proxy for the current HTTPConnection only. You can also
manipulate a list of hosts for which no proxy is to be used with the
methods
<A HREF="api/HTTPClient/HTTPConnection.html#dontProxyFor(java.lang.String)">HTTPConnection.dontProxyFor()</A>
and
<A HREF="api/HTTPClient/HTTPConnection.html#doProxyFor(java.lang.String)">HTTPConnection.doProxyFor()</A>.
Example:
<PRE>
    HTTPConnection.setProxyServer("my.proxy.dom", 8008);
    HTTPConnection.dontProxyFor("localhost");
    HTTPConnection.dontProxyFor(".mycompany.com");
    AuthorizationInfo.addBasicAuthorization("my.proxy.dom", 8008, realm, user, passwd);
    ...
    HTTPConnection con = new HTTPConnection(...);
</PRE>
This will cause all connections to use the proxy (and automatically
authenticate with it) except when connecting to localhost or any other
machine within the company.

<P>If you are using SOCKS then the method to use is <A
HREF="api/HTTPClient/HTTPConnection.html#setSocksServer(java.lang.String)">setSocksServer()</A>.
Note that both an http proxy and a SOCKS proxy can be set at the same
time, in which case a request is sent via the SOCKS server to the proxy
server, which in turn relays the request to the desired destination.

<P>Some proxies will proxy for protocols other than http using http to
contact the proxy itself. If you have such a proxy then you can use the
HTTPClient to do requests for other protocols through the proxy. To do
this you need to create an <var>HTTPConnection</var> to the proxy
itself (i.e. <em>don't</em> use <A
HREF="api/HTTPClient/HTTPConnection.html#setCurrentProxy(java.lang.String, int)">setCurrentProxy()</A>
or <A
HREF="api/HTTPClient/HTTPConnection.html#setProxyServer(java.lang.String, int)">setProxyServer()</A>)
and specify the full URL of the file/article/whatever in the Get(),
Put(), etc. Example:  if you want to retrieve the file
<var>/pub/README</var> via ftp from <var>rtfm.mit.edu</var> then you
could use something like:
<PRE>
    HTTPConnection proxy = new HTTPConnection("my.proxy.dom", 8000);
    HTTPResponse   resp  = proxy.Get("ftp://rtfm.mit.edu/pub/README");
    ...
</PRE>


<H2><A NAME="timeouts">Timeouts</A></H2>

<P>Sometimes one doesn't want to wait (almost) forever until a connection
is established to the server or until the server answers. In this case a
timeout can be set using the methods <A
HREF="api/HTTPClient/HTTPConnection.html#setDefaultTimeout(int)">HTTPConnection.setDefaultTimeout()</A>
and <A
HREF="api/HTTPClient/HTTPConnection.html#setTimeout(int)">HTTPConnection.setTimeout()</A>.
Setting a timeout will cause the client to limit the time it will spend
trying to get the hosts IP-address and establishing a connection with the
server, and it will also set the timeout on the socket while reading the response.

<P><strong>Note:</strong> if a timeout occurs while reading the response
(i.e. an <var>InterruptedIOException</var> is thrown) then doing another
<code>read()</code> may or may not lose data, depending on what streams have
been pushed on the response input stream. In general all HTTPClient streams are
restartable, but those from the JDK aren't (e.g GZIPInputStream). This means
that if such a stream has been pushed onto the stack (typically when a
compression content-coding or transfer-coding was used) and an
<var>InterruptedIOException</var> occurs then chances are the response data
will be corrupted.


<H2><A NAME="out_streams">Output Streams</A></H2>

<P>All request methods which may send data (such as POST and PUT) take
either a byte[] or an <A
HREF="api/HTTPClient/HttpOutputStream.html">HttpOutputStream</A>. Using
the latter may be beneficial when sending large amounts of data because
it streams the data directly over the socket in many cases. However,
restrictions on many servers may prevent the streaming, necessitating
buffering of the data by the HttpOutputStream. The class documentation
gives the full details, but in summary here are the things to consider:
if you know the length of the data beforehand then use the appropriate
constructor and you will be able to stream the data to any server. If
you do not know the length beforehand then you may encounter problems,
in which case you'll need to set the java system property
<var>HTTPClient.dontChunkRequests</var> to <var>true</var>, and all data
will be buffered by the client and only sent when the stream is
close()'d.

<P>Various modules (such as the RedirectionModule and AuthorizationModule)
need to resend certain requests in order to do their job. However, if the
request used an HttpOutputStream then they cannot do so; in such cases the
application will have to handle those statuses itself. In order to
partially solve this problem, an application may set the java system
property <var>HTTPClient.deferStreamed</var> to true. Requests which need
resending (and which used an HttpOutputStream) are then remembered by the
modules and the <A
HREF="api/HTTPClient/HTTPResponse.html#retryRequest()">retry flag</A> in
the response is set to true. The application can then check this flag, and
if it's true just blindly resend the request; the modules will detect the
resend and apply the appropriate modifications (such as adding the
necessary Authorization header field, or changing the request-uri to a new
location).


<H2><A NAME="contexts">Contexts</A></H2>

<P>By default all authorization info, cookies, permanent redirection
lists, etc. are shared between all instance of <var>HTTPConnection</var>.
However, there are cases where one wants to simulate multiple independent
clients or users within the same application. To enable this the
HTTPClient a supports a concept called contexts. Each
<var>HTTPConnection</var> instance has an associated context, which can be
any Object at all, and which can be set using (<A
HREF="api/HTTPClient/HTTPConnection.html#setContext(java.lang.Object)">HTTPConnection.setContext()</A>).
All instances of <var>HTTPConnection</var> which have the same context
will share all info on cookies, authorization info, etc. This is done by
having each <A HREF="#modules">module</A> which keeps information on
behalf of the application (such as the cookie module, the authorization
module and the redirection module) use a separate list for each context.
By setting the context for each HTTPConnection instance appropriately, you
can control exactly which instances share data and which don't.

<P>Note that you can't have more than one context per HTTPConnection,
i.e. you need at least one separate instance of HTTPConnection for each
client.

<P>Example: suppose you want to simulate 3 independent clients, each
having two HTTPConnection instances. Then you could do something like:

<pre>
    HTTPConnection cl1_con1 = new HTTPConnection(...);
    HTTPConnection cl1_con2 = new HTTPConnection(...);
    cl1_con1.setContext("client1");
    cl1_con2.setContext("client1");

    HTTPConnection cl2_con1 = new HTTPConnection(...);
    HTTPConnection cl2_con2 = new HTTPConnection(...);
    cl2_con1.setContext("client2");
    cl2_con2.setContext("client2");

    HTTPConnection cl3_con1 = new HTTPConnection(...);
    HTTPConnection cl3_con2 = new HTTPConnection(...);
    cl3_con1.setContext("client3");
    cl3_con2.setContext("client3");
</pre>

<P>As mentioned above, the context may be any object at all. So, if in the
above example you were going to run each client in a separate thread then
you could use the thread itself as the context (e.g.
<code>con.setContext(Thread.currentThread());</code>).  If no context is
set a default context is used which is the same for all
<var>HTTPConnection</var>'s. Therefore applications which don't need this
feature can just ignore it.


<H2><A NAME="pers_con">Persistent Connections (Keep-Alive's)</A></H2>

<P>The Hypertext Transfer Protocol originally allowed only one request per
TCP connection. However, establishing a TCP connection is fairly expensive
time wise, so that some implementors of HTTP/1.0 added so called
<var>Keep-Alive's</var> to keep a connection open after a request was
completed and to allow further requests to be made over that connection.
Unfortunately, this was not well defined and is broken in the face of
proxies. HTTP/1.1 defines persistent connections correctly and even
makes them the default.

<P>The HTTPClient will by default try to keep a connection alive for
as many requests as possible, both when talking to HTTP/1.0 and
HTTP/1.1 servers. To disable persistent connections you can
specify a <var>Connection</var> header with the value <var>close</var>.
Example:

<pre>
    NVPair[] def_hdrs = { new NVPair("Connection", "close") };
    con.setDefaultHeaders(def_hdrs);
</pre>

<P>This will disable persistent connections for all future request
(unless overridden by a connection header on the request method call).

<P>Keeping the socket connection to the server open after a request is
fine as long as another request follows within a short period of time.
However when you are done you should let the library know by passing the
above <code>"Connection: close"</code> header with the last request.
Furthermore, to limit the length of time the connection will be held
open a timer is started after each request which will close the socket
connection if no further requests arrive within the next 60 seconds.

<P>Note that most of this is transparent as far as the functioning of
the requests is concerned; the only differences you will notice is in
the time required for a request to be sent. Also note that persistent
connections are only done within the context of a given instance of
<var>HTTPConnection</var>; so if you create two instances both pointing
at the same server then they will create separate connections to the
server.


<H2><A NAME="close_con">Closing Connections</A></H2>

<P>The HTTPClient will close a socket as soon as it has determined that
no more data will arrive on the socket or that any outstanding data is
to be discarded. However, this is not always easy to do, and in at least
one common case the client will keep a connection open indefinitely
unless a simple precaution is taken (see below).

<P>A socket is closed when one of the five following conditions occurs:
<UL>
<LI> an exception occurs during read or write.
<LI> the connection timeout triggers and no responses are outstanding
     (note this is <em>not</em> the timeout set by <A
     HREF="api/HTTPClient/HTTPConnection.html#setTimeout(int)">HTTPConnection.setTimeout()</A>,
     but an internal, hard coded 60 second timeout).
<LI> all response streams on the connection have been closed, up to
     and including a terminal response.
<LI> <A
     HREF="api/HTTPClient/HTTPConnection.html#stop()">stop()</A> is invoked
     on the HTTPConnection.
<LI> the stream demultiplexer is finalized.
</UL>

<P>A "terminal" response is a response for which client has determined
that the connection should be closed after it has been fully received.
This includes the server sending a <code>"Connection: close"</code> (in
the case of an HTTP/1.1 server) or not sending a <code>Connection:
keep-alive</code> (in the case of an HTTP/1.0 server) with the
response, the response having no Content-length and no self-delimiting
body, or the receipt of certain error status codes.

<P>As can be seen from above, a basic condition for the connection to
be closed is that the response streams have been closed. This is
necessary because the client can't distinguish between a slow
application which takes its time to read the response and an
application which is not interested in the response. Failing to close
the response streams can lead to a lot of open connections hanging
around, which will eventually cause you to run out of file descriptors
(on a Unix box - socket descriptors on Windoze). Therefore it is
important that you <STRONG>always close the response stream</STRONG> of
every response. This can be done either by invoking
<code>resp.getData()</code> or invoking <code>close()</code> on the
stream from <code>resp.getInputStream()</code> (where <var>resp</var>
is the HTTPResponse). If you are not interested in the response data
you can just do a <code>resp.getInputStream().close()</code>.



<H2><A NAME="pipelining">Pipelining</A></H2>

<P>If the connection is kept open across requests then the requests may
be pipelined. Pipelining here means that a new request is sent before
the response to a previous request is received. Since this can
obviously enhance performance by reducing the overall round-trip time
for a series of requests to the same server, the HTTPClient has been
written to support pipelining (at the expense of some extra code to
keep track of the outstanding requests).

<P>The programming model is always the same: for every request you send
you get a response back which contains the headers and data of the
servers response. Now, to support pipelining, the fields in the
response aren't necessarily filled in yet when the HTTPResponse object
is returned to the caller (i.e. the actual response headers and data
haven't been read off the net), but the first call to any method in the
response (e.g. a <var>getStatusCode()</var>) will wait till the
response has actually been read and parsed. Also any previous requests
will be forced to read their responses if they have not already done so
(so e.g.  if you send two consecutive requests and receive responses
<var>r1</var> and <var>r2</var>, calling
<code>r2.getHeader("Content-type")</code> will first force the complete
response <var>r1</var> to be read before reading the response
<var>r2</var>). All this should be completely transparent, except for
the fact that invoking a method on one response may sometimes take a
few seconds to complete, while the same method on a different response
will return immediately with the desired info.


<H2><A NAME="conn_details">Relationship between HTTPConnection instances
    and TCP connections</A></H2>

<P>The relationship between an HTTPConnection instance and actual TCP
connections (instances of java.net.Socket) is as follows: each
HTTPConnection instance creates its own sockets, i.e. sockets are not
shared between instances. Sockets are created on demand, just before a
request is sent (so creating an HTTPConnection instance does
<em>not</em> immediately cause a TCP connection to be opened). A new
socket is only created if either the current socket (if any) has been
closed, or the previous request/response over it has determined that it
will be closed at the end that response.

<P>Some implications of the above are that 1) applications can control
the number of parallel TCP connections to some degree by creating
multiple instances of HTTPConnection pointing to the same server, but 2)
that control is not absolute: if the server does not support keep-alive's
and the application sends requests before having the read the response
to the previous request, then you may end up with multiple parallel TCP
connections from the same HTTPConnection instance.


<H2><A NAME="threading">Multithreading</A></H2>

<P>The HTTPClient is completely multithread safe (or at least should be
;-). A single HTTPConnection instance may be shared between multiple
threads, multiple requests may be run concurrently on a single or
multiple HTTPConnection instances, and you may even have multiple
threads reading on a single response (though I'm not sure that the
latter makes much sense). Because a number of parameters are on a per
HTTPConnection instance basis (such as the module list, proxy settings,
default headers, etc), care should be taken when modifying these
parameters if the HTTPConnection instance is shared among multiple
threads.


<H2><A NAME="urlcon">HttpURLConnection</A></H2>

<P>The <A HREF="api/HTTPClient/HttpURLConnection.html">HttpURLConnection</A>
class provides a java.net.HttpURLConnection compatible interface to
the HTTPClient. It exists mainly to allow the HTTPClient to be used as
a direct plug-in replacement for Sun's HttpClient (part of the JDK); if you
are writing HTTPClient specific code, then I recommend using the HTTPClient's
"native" API, i.e. <var>HTTPConnection</var> et. al.

<P>The HttpURLConnection follows the model of
<var>java.net.URLConnection</var> and
<var>java.net.HttpURLConnection</var> in that each instance is only
capable of doing a single request (i.e. a new instance must be created
for each request, usually via <code>URL.openConnection()</code> or
<code>URL.openStream()</code>). To enable reuse of connections,
HttpURLConnection manages a pool of HTTPConnection's internally.
Different instances of HttpURLConnection will use the same
HTTPConnection if they have the same host and port.  This way socket
connections are reused as much as possible (i.e. you still get to take
advantage of the persistent connections capability).  In addition, <A
HREF="api/HTTPClient/HttpURLConnection.html#connect()">connect()</A> will
only send the request but not read or parse the response; only when a
method such as <code>getInputStream()</code>,
<code>getResponseCode()</code>, etc, is invoked is the response read
and parsed. This, together with the reuse of HTTPConnection's, allows
requests to be pipelined by explicitly invoking
<code>connect()</code>.  Example:

<PRE>
    URLConnection con1 = url1.openConnection();
    con1.connect();				// sends first request
    URLConnection con2 = url2.openConnection();
    con2.connect();				// sends second request
    InputStream is1 = con1.getInputStream();	// waits for first response
    InputStream is2 = con2.getInputStream();	// waits for second response
</PRE>


<H2><A NAME="prot_vers">Protocol Version</A></H2>

<P>The request protocol version sent is always HTTP/1.1, except in a
few circumstances when a broken server is encountered, in which case
the version sent reverts to HTTP/1.0. An HTTP/0.9 request is never
sent.

<P>The protocol version returned by the server is used to select
between different mechanisms for persistent connections. If the server
advertises itself as being HTTP/1.1 compliant then HTTP/1.1 persistent
connections are used; otherwise HTTP/1.0 keep-alive's are used (the
difference is the tokens used in the Connection header for signaling
persistence and the end of a connection). Apart from this, the only
other distinction made between talking to an HTTP/1.0 or an HTTP/1.1
server is in how request are automatically retried (retried requests
with a body need to use slightly different mechanisms for determining
how long to wait after sending the headers, before sending the body).


<H2><A NAME="modules">Modules</A></H2>

<P>A number of functions of the HTTPClient, such as authorization and
redirection handling, are broken out into <A
HREF="api/HTTPClient/HTTPClientModule.html">modules</A>. Modules can be
dynamically added and removed, thereby allowing certain functionality to
be disabled or new functionality to be added without having to modify
any of the core code. Each connection has a list of modules it will use
to process a request and response. When a request method is invoked
(such as <code>Get()</code> or <code>Post()</code>) the request is first
assembled into a <A HREF="api/HTTPClient/Request.html">Request</A>
instance. Then the <A
HREF="api/HTTPClient/HTTPClientModule.html#requestHandler(HTTPClient.Request,
HTTPClient.Response[])">request handler</A> of each module is invoked in
turn with this request. This handler may modify the request (such as add
headers) or even generate a response directly (such as a cache might
do). Only after all handlers have been invoked (and none of them
generated a response) is the request actually sent over the wire.
Similarly, when a response is read the <A
HREF="api/HTTPClient/HTTPClientModule.html#responsePhase1Handler(HTTPClient.Response,
HTTPClient.RoRequest)">response handlers</A> in each module are invoked
in turn. They may do certain things based on the status code (such as
the redirection module) or the headers (such as the cookie module),
modify the response, or even generate a new request (such as in the
redirection and authentication modules). If a new request is generated
the process starts from the top again.

<P>The currently supplied modules are the
<var>AuthorizationModule</var>, the <var>RedirectionModule</var>, the
<var>ContentEncodingModule</var>, the <var>TransferEncodingModule</var>,
the <var>CookieModule</var>, the <var>ContentMD5Module</var>, the
<var>RetryModule</var> and the <var>DefaultModule</var>. These are
explained in more detail further down.

<P>Modules can be dynamically added and removed to tailor the request
and response processing desired. The methods <A
HREF="api/HTTPClient/HTTPConnection.html#addDefaultModule(java.lang.Class, int)">HTTPConnection.addDefaultModule()</A>,
<A
HREF="api/HTTPClient/HTTPConnection.html#removeDefaultModule(java.lang.Class)">HTTPConnection.removeDefaultModule()</A>
and <A
HREF="api/HTTPClient/HTTPConnection.html#getDefaultModules()">HTTPConnection.getDefaultModules()</A>
manipulate and return the list of default modules which is used when a
new <A HREF="api/HTTPClient/HTTPConnection.html">HTTPConnection</A>
is created. Similarly, the methods <A
HREF="api/HTTPClient/HTTPConnection.html#addModule(java.lang.Class, int)">HTTPConnection.addModule()</A>,
<A
HREF="api/HTTPClient/HTTPConnection.html#removeModule(java.lang.Class)">HTTPConnection.removeModule()</A>
and <A
HREF="api/HTTPClient/HTTPConnection.html#getModules()">HTTPConnection.getModules()</A>
manipulate and return the list of current modules for an HTTPConnection
instance. Note that for the modules which are not public you must get their
Class objects via <var>Class.forName()</var> (e.g. to remove the redirection
module, use
<code>con.removeModule(Class.forName("HTTPClient.RedirectionModule"))</code>).

<P>The default list of modules is initialized from the java system property
<var>HTTPClient.Modules</var>. This property must be a "|" separated list
of class names. If this property is not set it defaults to all the classes
listed below. Normally if during class initialization any module in the
list does not exist or cannot be instantiated then an Error is thrown.
However, if this is being used in an unsigned applet then the error is
suppressed.  This way applets can limit the modules loaded over the net by
simply not providing them (remember, they can't set system properties due to
security restrictions).

<P>You may create your own modules and add them using the above methods.
Any module you write must implement the <A
HREF="api/HTTPClient/HTTPClientModule.html">HTTPClientModule</A>
interface. See the API docs more info. <strong>Note:</strong> this
interface may change. If you write a module and find the interface
insufficient or difficult, please contact me. Also, if you write a module
which you think might be of general usefulness and would like to make it
freely available, let me know.

<P>Here is a short description of each module.


<H3><A NAME="auth_mod">AuthorizationModule</A></H3>

<P>Authorization briefly described in <A HREF="getting_started.html#auth">
Getting Started</A>. As mentioned, this module will handle both server and
proxy authorization requests (status codes 401 and 407). In addition to the
'Basic' and 'Digest' authorization schemes, the AuthorizationModule can be
made to handle other schemes as well, so long as they are not "too exotic"
(i.e. they follow the simple challenge-response mechanism outlined in the
http specs); this is done by setting your own <A
HREF="api/HTTPClient/AuthorizationHandler.html">AuthorizationHandler</A>.
Of course, if you need to something more sophisticated you can always plug
in your own authorization module.

<P>When confronted with an authorization request the authorization module
will query all known authorization info for a possible candidate (the
match must be for the host, port, scheme and realm). If no suitable info
is found, or if the server rejects any info found, an authorization
handler is called to try and get the necessary info from the user; if the
user does not give any information, or if the information she gives is
also rejected, then the retrying is terminated and the last failure
status returned to the caller. The default handler currently only
understands requests for the 'Basic' and 'Digest' authorization schemes;
you may however set your own handler via the <A
HREF="api/HTTPClient/AuthorizationInfo.html#setAuthHandler(HTTPClient.AuthorizationHandler)">AuthorizationInfo.setAuthHandler()</A>
method. The handler given must implement the <A
HREF="api/HTTPClient/AuthorizationHandler.html">AuthorizationHandler</A>
interface. To disable the handler completely give <var>null</var> for
the handler.

<P>A server (or proxy) may send multiple authorization challenges in the
response, in which case the above algorithm is modified to go through the
list of challenges in the same order as they were sent, trying to get
authorization info for each challenge in the list and going to the next
challenge if either no info was found or the server rejects that info. If
the end of the list is reached without achieving authorization then the
authorization handler is called on each challenge (in the same order)
until either an authorization request is successful, the authorization
handler returns null (e.g. when the Cancel button in the default popup
box is activated) or the list is exhausted, in which case the response to
the last failed request is returned.

<P>Whenever the <A HREF="api/HTTPClient/DefaultAuthHandler.html">default
authorization handler</A> needs to ask the user for a username and
password, it queries the current <A
HREF="api/HTTPClient/AuthorizationPrompter.html">AuthorizationPrompter</A>.
This is responsible for popping up a dialog box or otherwise acquiring
the necessary info. A default prompter is provided which will pop up a
simple dialog box, if the AWT is running, or use a command line prompt
if it isn't and the currently running system is Unix or VMS; you may set
your own prompter using <A
HREF="api/HTTPClient/DefaultAuthHandler.html#setAuthorizationPrompter(HTTPClient.AuthorizationPrompter)">DefaultAuthHandler.setAuthorizationPrompter()</A>.
To prevent the prompter from being queried (e.g. to prevent the popup
box from appearing) you can either set the prompter to null, or use <A
HREF="api/HTTPClient/HTTPConnection.html#setAllowUserInteraction(boolean)">con.setAllowUserInteraction(false)</A>.


<H3><A NAME="redir_mod">RedirectionModule</A></H3>

<P>This module handles the redirection status codes 301, 302, 303, 305
and 307. 301 and 307 responses are only redirected if the request method
was GET or HEAD; this is because redirecting, say, a POST blindly might
lead to undesired behaviour, as the circumstances leading to the POST
might have changed. 302 and 303 are treated identically: the new request
to the new location is done using GET (this is what many cgi's expect -
they are basically directing you to a prefabricated answer). 305 is only
honored if the connection is not already using a proxy.

<P>This module also keeps a list of permanently redirected URL's (status
code 301) and will preemptively redirect requests for these. This list is
volatile (i.e. it will be lost when the application exits).


<H3><A HREF="api/HTTPClient/CookieModule.html" NAME="cookie_mod">CookieModule</A></H3>

<P>This module implements cookies as defined by <A
HREF="http://home.netscape.com/newsref/std/cookie_spec.html">Netscape's
cookie spec</A> and the latest <A
HREF="http://www.ietf.org/rfc/rfc2965.txt">HTTP State Management
Mechanism</A> spec. During response processing it intercepts and parses
the Set-Cookie and Set-Cookie2 header fields (removing them from the
response in the process), and during request processing it adds all
eligible cookies to the requests.

<P>Because of privacy issues surrounding cookies, whenever the server
tries to set a cookie a <A
HREF="api/HTTPClient/CookiePolicyHandler.html">policy handler</A> is
invoked to see whether this cookie should be accepted. The default handler
brings up a popup describing the cookie and allowing the user to accept or
reject it; the user may also summarily accept or reject cookies from whole
domains. You may substitute your own policy handler using the <A
HREF="api/HTTPClient/CookieModule.html#setCookiePolicyHandler(HTTPClient.CookiePolicyHandler)">setCookiePolicyHandler()</A>
method. If you set the handler to <var>null</var> then all cookies will be
silently accepted. If you do not want any cookies to be accepted then
either remove the CookieModule from the list of modules or set your own
handler which always returns <var>false</var>. The handler must implement
the <A
HREF="api/HTTPClient/CookiePolicyHandler.html">CookiePolicyHandler</A>
interface.

<P>The CookieModule provides methods to view the currently stored cookies,
and to add and remove cookies from that list. If you want to manually provide
additional cookies for a request then you must add them to the CookieModule's
internal list - any Cookie header fields you create yourself and pass to
a request will be removed by the CookieModule.

<P>The CookieModule will save and restore cookies at the start and end of
the application only if the java system property
<var>HTTPClient.cookies.save</var> is set to <var>true</var>. All cookies
which are meant to persist across invocations are then saved to a file
when the application exits and read from that file when the CookieModule
is loaded. The format of the file is a serialized Hashtable of Cookies.
The name of file can controlled through the system property
<var>HTTPClient.cookies.jar</var>. If this is not set, a system dependent
name is used:
<UL>
<LI>Unix: <var>$HOME/.httpclient_cookies</var>
<LI>Mac: <var>System Folder:Preferences:HTTPClientCookies</var>
<LI>OS/2: <var>$HOME/.httpclient_cookies</var>
<LI>Win95: <var>$JAVA_HOME/.httpclient_cookies</var>
<LI>WinNT: <var>$HOME/.httpclient_cookies</var>
</UL>

(where $HOME is the value of the java system property "user.home", and
$JAVA_HOME is the value of the java system property "java.home").


<H3><A NAME="ce_mod">ContentEncodingModule</A></H3>

<P>Servers may apply various content encodings to the content. The most widely
used encodings are compressions: <var>gzip</var>, <var>compress</var>, and
<var>deflate</var>. This module will handle these three content encodings by
pushing an appropriate decoding stream; this means that the data read from <A
HREF="api/HTTPClient/HTTPResponse.html#getInputStream()">getInputStream()</A>
will be the clear text. The <var>Content-Encoding</var> header is also modified
appropriately.

<P>You might want to consider disabling this module if you are using the
HTTPClient for things like web-copying where storing the compressed
document makes sense.


<H3><A NAME="te_mod">TransferEncodingModule</A></H3>

<P>This is similar to the ContentEncoding module except that it applies to
transfer encodings. It also handles the <var>gzip</var>, <var>compress</var>,
and <var>deflate</var> encodings by pushing an appropriate decoding stream
onto the response input stream and modifies the Transfer-Encoding header
appropriately.


<H3><A NAME="md5_mod">ContentMD5Module</A></H3>

<P>Some servers may generate a <var>Content-MD5</var> header which
contains an MD5 hash of the message body (after any content encoding, but
before any transport encoding is applied). If this header is present this
module will push a stream which calculates the MD5 hash of the body. When
the stream is closed or the end of the data is reached the calculated
hash is compared to the one in the <var>Content-MD5</var> header and if
they don't match an <var>IOException</var> is thrown.

<H3><A NAME="def_mod">DefaultModule</A></H3>

<P>This handles the response status codes 408 (request timeout) and 411
(length required).


<H3><A NAME="retry_mod">RetryModule</A></H3>

<P>This module is special. It is responsible for automatically retrying
requests which were aborted due to an IOException on the socket. It is
unlike other modules in that it is closely tied in to the core code,
instead of just manipulating the request and response structures as
other modules do. The code in this module could have been put in with the
rest of the core code, but moving it to a module has the advantage that
this automatic retrying of requests may be disabled using the standard
mechanism of removing modules.


<H3>Ordering the Modules</H3>

<P>The handlers in the modules are invoked in the order the modules are
placed in the list. Because of certain constraints between modules this
order is important. The default order for the supplied modules is:

<OL>
<LI>RetryModule
<LI>CookieModule
<LI>RedirectionModule
<LI>AuthorizationModule
<LI>DefaultModule
<LI>TransferEncodingModule
<LI>ContentMD5Module
<LI>ContentEncodingModule
</OL>

However the constraints impose only a partial ordering, so that the
above order may be changed as long as the following restrictions are
observed:

<UL>
<LI>The RetryModule <em>must</em> be first. It catches a special
    <var>RetryException</var>, which is thrown by the core code. If it
    were not the first module it would not be able to catch this
    exception.
<LI>The TransferEncoding module must precede any module which handles
    the response message (such as the ContentEncoding module and the
    ContentMD5Module). There are a number of reasons for this, but
    basically it's because all the headers refer to the message before
    transport encoding is applied.
<LI>The ContentMD5 module must precede the ContentEncoding module, as
    the hash is calculated for the encoded message.
</UL>


<H2><A NAME="properties">Java System Properties recognized by HTTPClient</A></H2>

<P>There are a number of java system properties which are used by the
HTTPClient.  Most are documented somewhere in the api docs. Some of the
properties may contain a list of elements, in which case the elements are
separated by vertical bars ("|"). White space is ignored, except that a
<tt>"| |"</tt> produces an empty element whereas <tt>"||"</tt> is treated
like a single delimiter (i.e. <tt>"|"</tt>).

<P>The properties are read once, when the class that uses them is loaded
(i.e. they're read in the class' static initializer). This means they need
to be set either on the command line via the -D option, as in
<PRE>
    java -DHTTPClient.dontChunkRequests=true MyApp </PRE>
or early in your application before the class is first accessed:
<PRE>
    System.getProperties().put("HTTPClient.dontChunkRequests", "true");
    ...
    HTTPConnection.setDefaultHeaders(...); </PRE>

<P>Here is summary of all properties recognized:

<DL>
<DT>http.proxyHost
    <DD>Read by <A
	HREF="api/HTTPClient/HTTPConnection.html">HTTPConnection</A>.
	Used to specify the http proxy to use. See <A
	HREF="api/HTTPClient/HTTPConnection.html#setProxyServer(java.lang.String, int)">setProxyServer()</A>
	for more info. This is the same property that is used by Sun's JDK
	1.1 (and later).

<DT>http.proxyPort
    <DD>Read by <A
	HREF="api/HTTPClient/HTTPConnection.html">HTTPConnection</A>.
	Used to specify the http proxy to use. See <A
	HREF="api/HTTPClient/HTTPConnection.html#setProxyServer(java.lang.String, int)">setProxyServer()</A>
	for more info. This is the same property that is used by Sun's JDK
	1.1 (and later).

<DT>proxySet
    <DD>Read by <A
	HREF="api/HTTPClient/HTTPConnection.html">HTTPConnection</A>.
	Obsolete. Used by Sun's JDK 1.0.2. If <var>http.proxyHost</var>
	is not set and <var>proxySet</var> is true, then the default
	proxy is set using the values in <var>proxyHost</var> and
	<var>proxyPort</var>.

<DT>proxyHost
    <DD>Read by <A
	HREF="api/HTTPClient/HTTPConnection.html">HTTPConnection</A>.
	Obsolete. Used by Sun's JDK 1.0.2. If <var>http.proxyHost</var>
	is not set and <var>proxySet</var> is true, then the default
	proxy is set using the values in <var>proxyHost</var> and
	<var>proxyPort</var>.

<DT>proxyPort
    <DD>Read by <A
	HREF="api/HTTPClient/HTTPConnection.html">HTTPConnection</A>.
	Obsolete. Used by Sun's JDK 1.0.2. If <var>http.proxyHost</var>
	is not set and <var>proxySet</var> is true, then the default
	proxy is set using the values in <var>proxyHost</var> and
	<var>proxyPort</var>.

<DT>HTTPClient.nonProxyHosts
    <DD>Read by <A
	HREF="api/HTTPClient/HTTPConnection.html">HTTPConnection</A>.
	Used to specify a list hosts for which no http proxy is to be
	used. See <A
	HREF="api/HTTPClient/HTTPConnection.html#dontProxyFor(java.lang.String)">dontProxyFor()</A>
	for more info.

<DT>http.nonProxyHosts
    <DD>Read by <A
	HREF="api/HTTPClient/HTTPConnection.html">HTTPConnection</A>.
	Used to specify a list hosts for which no http proxy is to be
	used. See <A
	HREF="api/HTTPClient/HTTPConnection.html#dontProxyFor(java.lang.String)">dontProxyFor()</A>
	for more info. This is the same property that is used by Sun's
	JDK 1.1 (and later).

<DT>HTTPClient.socksHost
    <DD>Read by <A
	HREF="api/HTTPClient/HTTPConnection.html">HTTPConnection</A>.
	Used to specify the SOCKS proxy host. See <A
	HREF="api/HTTPClient/HTTPConnection.html#setSocksServer(java.lang.String, int, int)">setSocksServer()</A>
	for more info.

<DT>HTTPClient.socksPort
    <DD>Read by <A
	HREF="api/HTTPClient/HTTPConnection.html">HTTPConnection</A>.
	Used to specify the SOCKS proxy port. See <A
	HREF="api/HTTPClient/HTTPConnection.html#setSocksServer(java.lang.String, int, int)">setSocksServer()</A>
	for more info.

<DT>HTTPClient.socksVersion
    <DD>Read by <A
	HREF="api/HTTPClient/HTTPConnection.html">HTTPConnection</A>.
	Used to specify the SOCKS proxy version. See <A
	HREF="api/HTTPClient/HTTPConnection.html#setSocksServer(java.lang.String, int, int)">setSocksServer()</A>
	for more info.

<DT>HTTPClient.Modules
    <DD>Read by <A
	HREF="api/HTTPClient/HTTPConnection.html">HTTPConnection</A>.
	Used to define the default list of modules. See <A
	HREF="api/HTTPClient/HTTPConnection.html#addDefaultModule(java.lang.Class, int)">HTTPConnection.addDefaultModule()</A>
	for more info.

<DT>HTTPClient.disable_pipelining
    <DD>Read by <A
	HREF="api/HTTPClient/HTTPConnection.html">HTTPConnection</A>.
	Used to disable all pipelining. This should never be needed, but
	you may encounter a server which displays problems when pipelining
	requests. Setting this property to <var>true</var> will cause
	the HTTPClient to stall each request until the headers from the
	response to the previous request have been received and parsed.

<DT>HTTPClient.disableKeepAlives
    <DD>Read by <A
	HREF="api/HTTPClient/HTTPConnection.html">HTTPConnection</A>.
	Used to disable keep-alive's. This exactly equivalent to (and is
	implemented as) <code>con.setDefaultHeaders(new NVPair[] { new NVPair("Connection", "close") })</code>.

<DT>HTTPClient.forceHTTP_1.0
    <DD>Read by <A
	HREF="api/HTTPClient/HTTPConnection.html">HTTPConnection</A>.
	Used to force the client to pretend it's only capable of HTTP/1.0.
	Setting this to <var>true</var> causes all requests to be sent as
	if the server could only handle HTTP/1.0, and causes the HTTP-version
	in the request-line to be sent as HTTP/1.0.

<DT>HTTPClient.dontChunkRequests
    <DD>Read by <A
	HREF="api/HTTPClient/HTTPConnection.html">HTTPConnection</A>.
	Used to prevent the client from using the chunked transfer encoding
	on requests. Setting this to <var>true</var> causes all data written
	to an HttpOutputStream (that was not constructed with a content
	length) to be buffered and only sent when the stream is closed.

<DT>HTTPClient.deferStreamed
    <DD>Read by <A
	HREF="api/HTTPClient/HTTPConnection.html">HTTPConnection</A>.
	When a request is sent which uses an <A
	HREF="api/HTTPClient/HttpOutputStream.html">HttpOutputStream</A>, and
	handling of the response by modules would require the module to resend
	the request, then modules have to punt because they do not have the
	request data (i.e. the data written to the output stream). In order to
	make things easier on the application, an application can set this
	property to true and then check the <A
	HREF="api/HTTPClient/HTTPResponse.html#retryRequest()"> retry
	flag</A> in the response for whether it needs to resend the
	request. If the value is false, the retry flag will always be
	false. The default is false in order to prevent memory leaks in
	applications which aren't prepared for it.

<DT>HTTPClient.dontTimeoutRespBody
    <DD>Read by RespInputStream. Used to restore pre-V0.3-3 behaviour of
        disabling any timeout while reading the response body. Setting this
	to true causes read()'s on the response input stream to block
	indefinitely until some data arrives or the connection closes,
	irrespective of whether a timeout has been set or not.

<DT>HTTPClient.cookies.hosts.accept
    <DD>Read by <A
	HREF="api/HTTPClient/CookieModule.html">CookieModule</A>.
	Used to initialize the list of hosts and domains from which to
	always accept cookies. See <A
	HREF="api/HTTPClient/CookieModule.html#setCookiePolicyHandler(HTTPClient.CookiePolicyHandler)">setCookiePolicyHandler()</A>
	for more info.

<DT>HTTPClient.cookies.hosts.reject
    <DD>Read by <A
	HREF="api/HTTPClient/CookieModule.html">CookieModule</A>.
	Used to initialize the list of hosts and domains from which to
	always reject cookies. See <A
	HREF="api/HTTPClient/CookieModule.html#setCookiePolicyHandler(HTTPClient.CookiePolicyHandler)">setCookiePolicyHandler()</A>
	for more info.

<DT>HTTPClient.cookies.save
    <DD>Read by <A
	HREF="api/HTTPClient/CookieModule.html">CookieModule</A>.
	If set to true then the CookieModule will read stored cookie at
	startup time, and save cookies that should persist at application
	exit time.

<DT>HTTPClient.cookies.jar
    <DD>Read by <A
	HREF="api/HTTPClient/CookieModule.html">CookieModule</A>.
	This specifies the cookie jar to use for reading and saving
	cookies. If not set, a system dependent name is used.

<DT>HTTPClient.log.file
    <DD>Read by <A HREF="api/HTTPClient/Log.html">Log</A>. If defined, this
	specifies the name of the file to which log messages are to be written.
	By default log messages are written to <var>System.err</var>.

<DT>HTTPClient.log.mask
    <DD>Read by <A HREF="api/HTTPClient/Log.html">Log</A>. If defined, this
	specifies which debug messages should be logged. The value is a number
	which is the bitwise OR ('|') of the values of the facilities for which
	logging should be enabled. E.g. the value -1 enables logging for all
	facilities. By default logging is disabled.

<DT>HTTPClient.HttpURLConnection.AllowUI
    <DD>Read by <A
	HREF="api/HTTPClient/HttpURLConnection.html">HttpURLConnection</A>.
	If set to true then set the defaultAllowUserInteraction to true.
	By default this is set to false in java.net.URLConnection.

<DT>http.agent
    <DD>Read by <A
	HREF="api/HTTPClient/HttpURLConnection.html">HttpURLConnection</A>.
	If set then the "User-Agent" header is set to this property's value.
</DL>


<H2><A NAME="headers">HTTP Header Fields</A></H2>

<P>All request methods accept optional header fields to be sent with the
request.  Here are a list of possible request and response header fields
as defined in the HTTP/1.1 spec. I have added some comments to some of
them, but for further info I recommend getting the <A
HREF="#specs">specs</A> (every header field is described in a paragraph of
its own in the spec, so you can read just the part that interests you and
ignore the rest).


<H3>Request Header Fields</H3>

<ul>
<li><var>Cache-Control</var>
<li><var>Connection</var> - used for persistent connections; need not be
    set by applications
<li><var>Date</var> - use only in PUT/POST; even then optional
<li><var>Keep-Alive</var> - used for persistent connections (HTTP/1.0
    only); set by the HTTPClient as needed.
<li><var>Pragma</var>
<li><var>Transfer-Encoding</var> - set by the client as necessary
<li><var>Upgrade</var>
<li><var>Via</var>
</ul>
<ul>
<li><var>Accept</var>
<li><var>Accept-Charset</var>
<li><var>Accept-Encoding</var> - set by the ContentEncoding module
<li><var>Accept-Language</var>
<li><var>Authorization</var> - generated by HTTPClient as necessary
<li><var>From</var>
<li><var>Host</var> - set by HTTPClient; may be overridden by application
<li><var>If-Modified-Since</var> - the server returns 304 if the document
    was not modified
<li><var>If-Match</var> - the server returns 412 if the match fails
<li><var>If-NoneMatch</var> - the server returns 304 if the match succeeds
<li><var>If-Range</var>
<li><var>If-Unmodified-Since</var> - the server returns 412 if modified
<li><var>Max-Forwards</var> - used with TRACE and OPTIONS
<li><var>Proxy-Authorization</var> - generated by HTTPClient as necessary
<li><var>Range</var>
<li><var>Referer</var>
<li><var>User-Agent</var> - generated by HTTPClient; can be modified
</ul>
<ul>
<li><var>Allow</var> - only with PUT
<li><var>Content-Base</var>
<li><var>Content-Encoding</var> - examples: gzip, compress, deflate
<li><var>Content-Language</var>
<li><var>Content-Length</var> - set by HTTPClient
<li><var>Content-MD5</var>
<li><var>Content-Range</var>
<li><var>Content-Type</var>
</ul>


<H3>Response Header Fields</H3>

<ul>
<li><var>Cache-Control</var>
<li><var>Connection</var> - used for persistent connections
<li><var>Date</var> - date that the document was delivered
    (<strong>not</strong> when it was last modified)
<li><var>Keep-Alive</var> - used for persistent connections (HTTP/1.0 only)
<li><var>Pragma</var>
<li><var>Transfer-Encoding</var> - HTTPClient handles <var>chunked</var>
    in the core code; <var>gzip</var> and <var>deflate</var> are handled
    by the TransferEncoding module.
<li><var>Upgrade</var> - sent with a 101 response; 
<li><var>Via</var> - carries a list of proxies that were involved in
    returning the request; especially interesting for TRACE and OPTIONS
    requests.
</ul>
<ul>
<li><var>Age</var> - from caches
<li><var>Location</var> - sent with a redirection status code;
    since HTTPClient automatically handles 301, 302, 303, 305 and 307 status
    codes this field is seldom seen.
<li><var>Proxy-Authenticate</var> - read by HTTPClient
<li><var>Public</var>
<li><var>Retry-After</var> - optionally sent with a 503 status
<li><var>Vary</var>
<li><var>Warning</var>
<li><var>WWW-Authenticate</var> - read by HTTPClient
</ul>
<ul>
<li><var>Accept-Ranges</var>
<li><var>Allow</var> - with a 405 status
<li><var>Content-Base</var>
<li><var>Content-Encoding</var> - <var>gzip</var>, <var>deflate</var>,
    and <var>compress</var> are handled by the ContentEncoding module.
<li><var>Content-Language</var>
<li><var>Content-Length</var>
<li><var>Content-Location</var>
<li><var>Content-MD5</var> - handled by the ContentMD5 module.
<li><var>Content-Range</var>
<li><var>Content-Type</var>
<li><var>ETag</var>
<li><var>Expires</var> - the date when the document expires
<li><var>Last-Modified</var> - the date the document was last modified
</ul>


<H1><A NAME="specs">Further Reading</A></H1>

<DL>
<DD> <IMG ALT="*" SRC="images/greenball.gif">
     <A HREF="http://www.w3.org/Protocols/">General HTTP Info at W3C</A>
<DD> <IMG ALT="*" SRC="images/greenball.gif">
     <A HREF="http://www.ietf.org/rfc/rfc1945.txt">HTTP/1.0 Spec (RFC 1945)
     </A>
<DD> <IMG ALT="*" SRC="images/greenball.gif">
     <A HREF="http://www.ietf.org/rfc/rfc2616.txt">HTTP/1.1 Spec (RFC 2616)
     </A>
<DD> <IMG ALT="*" SRC="images/greenball.gif">
     <A HREF="http://www.ietf.org/rfc/rfc2617.txt">Digest Authentication Spec (RFC 2617)
     </A>
</DL>


<P>
<A HREF="index.html">
<IMG SRC="images/back.gif" ALT="[HTTPClient]"></A>
<HR>

<ADDRESS>
Ronald Tschal&auml;r / 6. May 2001 /
<A HREF="mailto:ronald@innovation.ch">ronald@innovation.ch</A>.
</ADDRESS>

</BODY>

</HTML>

